# Introduction
{:.no_toc}
The nucleolus is a prominent structure of the nucleus of eukaryotic cells and is involved in ribosome biogenesis and cell cycle regulation. In DNA-stained cells, nucleoli appear as regions within the nuclei that lack DNA signal (Fig. 1).
Phenotypes caused by reduced gene function are widely used to elucidate gene function, and image-based RNA interference (RNAi) screens are routinely used to find and characterize genes involved in a particular biological process. While screens typically focus on one biological process of interest, the molecular markers used can also inform on other processes. Re-using the image data of published screens can therefore be a cost-effective alternative to performing new experiments.
In particular, regardless of the targeted biological process, many screens include a DNA label and therefore can also reveal the effect of gene knock-downs on nucleoli.
In this tutorial, we will analyze DNA channel images of publicly available RNAi screens to extract numerical descriptors (i.e., features) of nucleoli. The images and associated metadata will be retrieved from the [Image Data Resource (IDR)](https://idr.openmicroscopy.org){:target="_blank"}, a repository that collects image datasets of tissues and cells.
To process and analyze the images, we will use CellProfiler, a popular image analysis program. CellProfiler normally comes as a desktop application in which users compose image analysis workflows from a series of modules. These modules are now also available as tools in Galaxy.
To fully emulate the behaviour of the standalone CellProfiler in Galaxy, each image analysis workflow needs to have three parts:

1. Starting Modules {% icon tool %} to initialise the pipeline,
2. tools performing the analysis (Fig. 2): identification of the background, nuclei and nucleoli, and feature extraction,
3. CellProfiler {% icon tool %} to actually run the pipeline.
Here you will learn how to create a workflow that downloads a selection of images from the IDR and then uses CellProfiler to segment the nuclei, and the nucleoli within them. You will also learn how to extract and export features at three different levels: image, nucleus and nucleolus.
> ### Agenda
>
> In this tutorial, we will cover:
>
> 1. TOC
> {:toc}
>
{: .agenda}
> ### {% icon hands_on %} Hands-on: Download images from the IDR
>
> 1. If you are logged in, create a new history for this tutorial.
>
>    {% include snippets/create_new_history.md %}
>
> 2. IDR Download {% icon tool %} with the following parameters:
>    - “How would you like to specify the IDs of images to download?”: As text (comma-separated list of IDs or a valid IDR link)
>    - “Image IDs to download”: `http://idr.openmicroscopy.org/webclient/?show=image-295900|image-295905|image-295910|image-295918|image-295928|image-295934`
>    - “Name of the channel to download”: Cy3
>    - “z-plane of images to download”: 0
>    - “Image frame to download”: 0
>    - “Limit the download to a selected region of the image?”: No, download the entire image plane
>    - “Skip failed retrievals?”: Yes
>    - “Download images in a tarball?”: Yes
>
>    > ### {% icon tip %} Tip: Get the IDR link from a manual selection of images
>    >
>    > To get a valid IDR link, go to the dataset of interest in the [IDR](https://idr.openmicroscopy.org){:target="_blank"} and select a few images in the preview of a plate (Fig. 3, 1). Once they appear at the bottom of the page (Fig. 3, 2), select them again and click the link button in the top-right corner of the right panel (Fig. 3, 3).
>    {: .tip}
>
>    > ### {% icon comment %} Comment
>    >
>    > IMPORTANT: When the number of images to download is high, it is recommended to enable the option “Download images in a tarball?” to improve performance.
>    {: .comment}
{: .hands_on}
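The “Image IDs to download” value above is an IDR webclient link in which each image is written as `image-<ID>` and the entries are separated by `|`. The sketch below builds such a link, and the alternative comma-separated list, from numeric image IDs. It assumes that the `?show=` format of the example URL generalises to other image selections; the helper names are hypothetical and not part of the IDR Download tool.

```python
# Hypothetical helpers: build the two input formats accepted by IDR Download,
# following the format of the example link above.
IDR_WEBCLIENT = "http://idr.openmicroscopy.org/webclient/"


def build_idr_link(image_ids):
    """Join numeric image IDs into a single IDR 'show' link."""
    if not image_ids:
        raise ValueError("at least one image ID is required")
    selection = "|".join(f"image-{int(i)}" for i in image_ids)
    return f"{IDR_WEBCLIENT}?show={selection}"


def build_id_list(image_ids):
    """Alternative input: a plain comma-separated list of IDs."""
    return ",".join(str(int(i)) for i in image_ids)


ids = [295900, 295905, 295910, 295918, 295928, 295934]
print(build_idr_link(ids))
print(build_id_list(ids))
```

For the six example IDs, `build_idr_link` reproduces the link used in the hands-on above.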
> ### {% icon question %} Question
>
> 1. Why are we taking the `Cy3` channel in the example data?
> 2. How could we download 100,000 images in one go?
>
> > ### {% icon solution %} Solution
> >
> > 1. The `Cy3` dye was used in the study to stain DNA, and since we want to segment the absence of DNA, that is the only channel we need to download from the IDR.
> > 2. We could upload a text file with the image IDs of interest.
> {: .solution}
{: .question}
The tool Starting Modules {% icon tool %} comprises the first 4 modules of the standalone CellProfiler (Images, Metadata, NamesAndTypes and Groups). It has to be used at the beginning of a workflow because it sets the naming and metadata handling for the rest of the tools.

> ### {% icon hands_on %} Hands-on: Specify metadata to CellProfiler
>
> 1. Starting Modules {% icon tool %} with the following parameters:
>    - Images
>      - “Do you want to filter only the images?”: Select the images only
>    - Metadata
>      - “Do you want to extract the metadata?”: Yes, specify metadata
>      - “Metadata extraction method”: Extract from file/folder names
>      - “Metadata source”: File name
>      - “Select the pattern to extract metadata from the file name”: `field1__field2__field3__field4__field5__field6`
>      - “Extract metadata from”: All images
>    - NamesAndTypes
>      - “Process 3D”: No, do not process 3D data
>      - “Assign a name to”: Give every image the same name
>      - “Name to assign these images”: DNA
>      - “Select the image type”: Grayscale image
>      - “Set intensity range from”: Image metadata
>    - Groups
>      - “Do you want to group your images?”: Yes, group the images
>      - “param”: field1
>
>    > ### {% icon comment %} Comment
>    >
>    > The images downloaded from the IDR are named following the pattern `plate__imageID__cropX__cropY__cropWidth__cropHeight`. These fields indicate the plate the image belongs to, the identifier of the image in the IDR, and the 4 cropping parameters selected: in our case, the upper-left corner (X, Y) and the width and height from there. We have a total of 6 metadata values encoded in the file name, separated by `__`. The pattern to properly extract our metadata from the file name is therefore `field1__field2__field3__field4__field5__field6`. Keep in mind that, later in the analysis, CellProfiler will refer to the plate as field1, to the imageID as field2, to cropX as field3, and so on.
>    {: .comment}
{: .hands_on}
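To see how the pattern maps onto a file name, the sketch below splits a name on `__` into `field1` to `field6` in plain Python. It only illustrates the mapping and is not CellProfiler's own metadata extraction. The example file name is made up, and we assume the file extension is stripped before splitting.

```python
import os

# Field names in the order they appear in the file name:
# plate__imageID__cropX__cropY__cropWidth__cropHeight
FIELD_NAMES = [f"field{i}" for i in range(1, 7)]


def extract_metadata(filename):
    """Split an IDR-style file name on '__' into field1..field6."""
    stem, _extension = os.path.splitext(os.path.basename(filename))
    values = stem.split("__")
    if len(values) != len(FIELD_NAMES):
        raise ValueError(f"expected 6 fields, got {len(values)}: {filename}")
    return dict(zip(FIELD_NAMES, values))


# Hypothetical file name, for illustration only.
meta = extract_metadata("images/myplate__295900__0__0__512__512.tiff")
print(meta["field1"], meta["field2"])  # plate and IDR image ID
```

Grouping by `field1`, as in the Groups settings above, then corresponds to grouping images that share the same plate value.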
Since we are interested in segmenting the nucleoli, you may wonder why we need to segment the nuclei first. There are two main reasons:

- Get the nuclei features. The intensity, size, shape, number of nucleoli per nucleus, etc. can be informative when studying the nucleoli.
- Avoid detecting spurious spots. In a first pass, we detect the nuclei, while in the second pass we segment the holes (nucleoli). The holes need to fall inside the nuclei, hence the importance of segmenting the nuclei too.
In the first step, we will identify the nuclei that are complete, meaning that they are not touching the borders of the image.
> ### {% icon hands_on %} Hands-on: Segment nuclei that are complete within the boundaries of the image
>
> 1. IdentifyPrimaryObjects {% icon tool %} with the following parameters:
>    - {% icon param-file %} “Select the input CellProfiler pipeline”: `output_pipeline` (output of Starting Modules {% icon tool %})
>    - “Use advanced settings?”: Yes, use advanced settings
>      - “Enter the name of the input image (from NamesAndTypes)”: DNA
>      - “Enter the name of the primary objects to be identified”: Nuclei
>      - “Typical minimum diameter of objects, in pixel units (Min)”: 15
>      - “Typical maximum diameter of objects, in pixel units (Max)”: 200
>      - “Discard objects outside the diameter range?”: Yes
>      - “Discard objects touching the border of the image?”: Yes
>      - “Threshold strategy”: Global
>        - “Thresholding method”: Otsu
>          - “Two-class or three-class thresholding?”: Two classes
>        - “Threshold correction factor”: 0.9
>      - “Method to distinguish clumped objects”: Shape
>        - “Method to draw dividing lines between clumped objects”: Shape
>        - “Automatically calculate size of smoothing filter for declumping?”: Yes
>        - “Automatically calculate minimum allowed distance between local maxima?”: Yes
>      - “Handling of objects if excessive number of objects identified”: Continue
>
>    > ### {% icon comment %} Comment
>    >
>    > - The names entered for the input image and objects have to be consistent (case-sensitive) with the names in NamesAndTypes and in the tools that follow.
>    > - The min and max diameter of the objects (“Typical minimum diameter of objects, in pixel units (Min)” and “Typical maximum diameter of objects, in pixel units (Max)”) will have to be adjusted to the resolution of the images.
>    {: .comment}
{: .hands_on}
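The two discard options can be illustrated with a small sketch. It assumes a labelled image stored as nested lists (0 for background, 1, 2, ... for the objects) and approximates each object's diameter as the diameter of a circle with the same area. This follows the idea behind the Min and Max settings but is our own simplification, not CellProfiler's code.

```python
import math
from collections import defaultdict


def filter_objects(labels, min_diameter, max_diameter, discard_border=True):
    """Return the labels of objects kept after size and border filtering.

    labels: 2D list of ints, 0 = background, k > 0 = pixel of object k.
    """
    height, width = len(labels), len(labels[0])
    area = defaultdict(int)
    touches_border = set()
    for y, row in enumerate(labels):
        for x, label in enumerate(row):
            if label == 0:
                continue
            area[label] += 1
            if y in (0, height - 1) or x in (0, width - 1):
                touches_border.add(label)
    kept = set()
    for label, pixels in area.items():
        # Diameter of a circle with the same area as the object.
        equivalent_diameter = 2 * math.sqrt(pixels / math.pi)
        if not min_diameter <= equivalent_diameter <= max_diameter:
            continue  # "Discard objects outside the diameter range?"
        if discard_border and label in touches_border:
            continue  # "Discard objects touching the border of the image?"
        kept.add(label)
    return kept


labels = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 2, 2, 0],
    [0, 0, 0, 2, 2, 0],
    [0, 0, 0, 0, 0, 0],
]
print(filter_objects(labels, min_diameter=1, max_diameter=10))
```

Object 1 touches the image border and is dropped, while object 2 is kept. With `discard_border=False`, both objects would be kept.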
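> ### {% icon question %} Questions
>
> We are using Otsu’s method here for thresholding. What other thresholding options are available? What is the difference between them?
>
> > ### {% icon solution %} Solution
> >
> > With the global strategy, the available methods are Manual, Measurement, Minimum cross entropy, Otsu and Robust background. With the adaptive strategy, only Otsu is available. A global strategy computes a single threshold for the whole image, whereas an adaptive strategy computes a separate threshold for each local region of the image. Check the parameters’ help to get more information on each one.
> {: .solution}
{: .question}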
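To make the Otsu option more concrete, here is a minimal pure-Python sketch of Otsu's method on an intensity histogram: it picks the cut point that maximises the between-class variance of the two resulting classes. We assume intensities scaled to the range 0 to 1 and treat the “Threshold correction factor” (0.9 above) as a multiplier applied to the computed threshold. When several cut points tie, the sketch takes the middle one, which is our own choice. CellProfiler's implementation differs in its details.

```python
def otsu_threshold(values, bins=256):
    """Return an Otsu threshold for intensities scaled to the range [0, 1]."""
    # Histogram of the intensities (values outside [0, 1] are clamped).
    hist = [0] * bins
    for v in values:
        hist[max(0, min(int(v * bins), bins - 1))] += 1
    total = len(values)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_variance, best_bins = -1.0, []
    weight_bg, sum_bg = 0, 0
    for t in range(bins):
        # Background class: bins 0..t; foreground class: bins t+1..bins-1.
        weight_bg += hist[t]
        sum_bg += t * hist[t]
        weight_fg = total - weight_bg
        if weight_bg == 0:
            continue
        if weight_fg == 0:
            break
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance, up to a constant factor.
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_variance, best_bins = variance, [t]
        elif variance == best_variance:
            best_bins.append(t)
    if not best_bins:
        raise ValueError("need at least two distinct intensity levels")
    # Tie-breaking (our choice): middle of the tied cut points.
    t = best_bins[len(best_bins) // 2]
    # Convert the bin index back to an intensity (upper edge of bin t).
    return (t + 1) / bins


def apply_threshold(values, threshold, correction_factor=1.0):
    """Classify pixels as foreground (True) above the corrected threshold."""
    corrected = threshold * correction_factor
    return [v > corrected for v in values]


dark = [0.10] * 50    # hypothetical background intensities
bright = [0.80] * 50  # hypothetical nucleus intensities
threshold = otsu_threshold(dark + bright)
foreground = apply_threshold(dark + bright, threshold, correction_factor=0.9)
```

A correction factor below 1 lowers the threshold slightly, so more pixels are counted as foreground.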
The previous tool gave us a set of objects (nuclei). Now, we want to export the segmentation masks as a single image to check how well the segmentation algorithm is performing. We also want to label the nuclei with their identifiers for later visual inspection of the results. The output of this step will look like:

*Figure: identified nuclei with their labels.*
> ### {% icon question %} Questions
>
> Why are some nuclei not labeled in the image above?
>
> > ### {% icon solution %} Solution
> >
> > We have indicated in the tool IdentifyPrimaryObjects {% icon tool %} that nuclei that are either outside the diameter range or touching the border should be discarded.
> {: .solution}
{: .question}
> ### {% icon hands_on %} Hands-on: Mask the nuclei detected
>
> 1. ConvertObjectsToImage {% icon tool %} with the following parameters:
>    - {% icon param-file %} “Select the input CellProfiler pipeline”: `output_pipeline` (output of IdentifyPrimaryObjects {% icon tool %})
>    - “Enter the name of the input objects you want to convert to an image”: Nuclei
>    - “Enter the name of the resulting image”: MaskNuclei
>    - “Select the color format”: Binary (black & white)
> 2. DisplayDataOnImage {% icon tool %} with the following parameters:
>    - {% icon param-file %} “Select the input CellProfiler pipeline”: `output_pipeline` (output of ConvertObjectsToImage {% icon tool %})
>    - “Display object or image measurements?”: Object
>      - “Enter the name of the input objects”: Nuclei
>      - “Measurement category”: Number
>    - “Enter the name of the image on which to display the measurements”: DNA
>    - “Display mode”: Text
>      - “Text color”: #ff0000
>    - “Number of decimals”: 0
>    - “Name the output image that has the measurements displayed”: ImageDisplay
> 3. SaveImages {% icon tool %} with the following parameters:
>    - {% icon param-file %} “Select the input CellProfiler pipeline”: `output_pipeline` (output of DisplayDataOnImage {% icon tool %})
>    - “Select the type of image to save”: Image
>      - “Saved the format to save the image(s)”: tiff
>      - “Enter the name of the image to save”: ImageDisplay
>    - “Select method for constructing file names”: From image filename
>      - “Enter the image name (from NamesAndTypes) to be used as file prefix”: DNA
>      - “Append a suffix to the image file name?”: Yes
>        - “Text to append to the image name”: `_nucleiNumbers`
>    - “Overwrite existing files without warning?”: Yes
>
>    > ### {% icon comment %} Comment
>    >
>    > The “Text color” parameter can be any color of your choice; it just needs to be visible on top of the nuclei.
>    {: .comment}
{: .hands_on}
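With “Binary (black & white)”, ConvertObjectsToImage turns every object pixel white and the background black, whatever the object's number, while DisplayDataOnImage writes each object's number on top of the image. The sketch below mimics both ideas on a labelled image stored as nested lists. It is a simplification of ours: we assume a number would be placed at the object's centroid, and the tool's exact text placement may differ.

```python
def labels_to_binary(labels):
    """Convert a labelled image (0 = background, k > 0 = object k) to a 0/1 mask."""
    return [[1 if label > 0 else 0 for label in row] for row in labels]


def object_centroids(labels):
    """Mean (row, column) position of each object, e.g. to place its number."""
    sums = {}
    for y, row in enumerate(labels):
        for x, label in enumerate(row):
            if label > 0:
                sum_y, sum_x, count = sums.get(label, (0, 0, 0))
                sums[label] = (sum_y + y, sum_x + x, count + 1)
    return {label: (sy / n, sx / n) for label, (sy, sx, n) in sums.items()}


labels = [
    [0, 1, 1],
    [0, 1, 1],
    [2, 0, 0],
]
mask = labels_to_binary(labels)
positions = object_centroids(labels)
```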
The nucleoli lack signal in the DNA staining, so we first need to enhance these dark holes and then mask the result with the nuclei, so that only holes inside nuclei are kept.
> ### {% icon hands_on %} Hands-on: Detect and mask dark holes
>
> 1. EnhanceOrSuppressFeatures {% icon tool %} with the following parameters:
>    - {% icon param-file %} “Select the input CellProfiler pipeline”: `output_pipeline` (output of SaveImages {% icon tool %})
>    - “Enter the name of the input image”: DNA
>    - “Enter a name for the resulting image”: DNAdarkholes
>    - “Select the operation”: Enhance
>      - “Feature type”: Dark holes
>        - “Maximum hole size”: 15
> 2. MaskImage {% icon tool %} with the following parameters:
>    - {% icon param-file %} “Select the input CellProfiler pipeline”: `output_pipeline` (output of EnhanceOrSuppressFeatures {% icon tool %})
>    - “Enter the name of the input image”: DNAdarkholes
>    - “Enter the name of the resulting image”: MaskDNAdarkholes
>    - “Use objects or an image as a mask?”: Objects
>      - “Enter the name objects to mask the input image”: Nuclei
>    - “Invert the mask?”: No
{: .hands_on}
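Masking the enhanced image with the Nuclei objects keeps the dark-hole signal only inside the nuclei, so holes in the background are ignored in the next segmentation. The sketch below shows that masking step on nested lists, assuming that pixels outside the mask are simply set to 0. The dark-hole enhancement itself is not reproduced, and the input values are hypothetical.

```python
def mask_with_objects(image, labels, invert=False):
    """Zero out pixels outside the objects (or inside them if invert=True)."""
    masked = []
    for image_row, label_row in zip(image, labels):
        new_row = []
        for value, label in zip(image_row, label_row):
            inside = label > 0
            keep = inside != invert  # "Invert the mask?" flips the kept region
            new_row.append(value if keep else 0.0)
        masked.append(new_row)
    return masked


dark_holes = [[0.9, 0.2], [0.7, 0.0]]  # hypothetical enhanced dark-hole values
nuclei = [[0, 1], [1, 1]]              # labelled nuclei (0 = background)
print(mask_with_objects(dark_holes, nuclei))
```

The strong value at the top-left pixel lies outside every nucleus, so it is removed before the nucleoli are segmented.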
Now that we have all the holes in one masked image, we can segment the nucleoli as individual objects in the same way as we did with the nuclei. All the nucleoli can then be combined into one single image.
> ### {% icon hands_on %} Hands-on: Segment nucleoli as individual objects
>
> 1. IdentifyPrimaryObjects {% icon tool %} with the following parameters:
>    - {% icon param-file %} “Select the input CellProfiler pipeline”: `output_pipeline` (output of MaskImage {% icon tool %})
>    - “Use advanced settings?”: Yes, use advanced settings
>      - “Enter the name of the input image (from NamesAndTypes)”: MaskDNAdarkholes
>      - “Enter the name of the primary objects to be identified”: Nucleoli
>      - “Typical minimum diameter of objects, in pixel units (Min)”: 2
>      - “Typical maximum diameter of objects, in pixel units (Max)”: 15
>      - “Discard objects touching the border of the image?”: Yes
>      - “Threshold strategy”: Global
>        - “Thresholding method”: Otsu
>          - “Two-class or three-class thresholding?”: Two classes
>        - “Threshold correction factor”: 0.9
>      - “Method to distinguish clumped objects”: Shape
>        - “Method to draw dividing lines between clumped objects”: Shape
>        - “Automatically calculate size of smoothing filter for declumping?”: Yes
>        - “Automatically calculate minimum allowed distance between local maxima?”: Yes
>      - “Handling of objects if excessive number of objects identified”: Continue
> 2. ConvertObjectsToImage {% icon tool %} with the following parameters:
>    - {% icon param-file %} “Select the input CellProfiler pipeline”: `output_pipeline` (output of IdentifyPrimaryObjects {% icon tool %})
>    - “Enter the name of the input objects you want to convert to an image”: Nucleoli
>    - “Enter the name of the resulting image”: MaskNucleoli
>    - “Select the color format”: Binary (black & white)
{: .hands_on}
We now have one segmentation mask per image with all the detected nuclei, MaskNuclei, and another one for the nucleoli, MaskNucleoli. These are binary masks in which the background is black and the detected objects are white. To check whether both segmentation steps went well, we can combine both masks into one image using different colors. Here we color the nuclei mask blue and the nucleoli mask red; since the nucleoli lie within the nuclei, both colors overlap there and the nucleoli appear magenta. The outcome will look like:

*Figure: combined mask of nuclei and nucleoli.*
> ### {% icon hands_on %} Hands-on: Convert and save the nuclei and nucleoli masks
>
> 1. GrayToColor {% icon tool %} with the following parameters:
>    - {% icon param-file %} “Select the input CellProfiler pipeline”: `output_pipeline` (output of ConvertObjectsToImage {% icon tool %})
>    - “Enter the name of the resulting image”: CombinedMask
>    - “Select a color scheme”: RGB
>      - “Enter the name of the image to be colored red”: MaskNucleoli
>      - “Relative weight for the red image”: 0.8
>      - “Enter the name of the image to be colored blue”: MaskNuclei
>      - “Relative weight for the blue image”: 0.5
> 2. SaveImages {% icon tool %} with the following parameters:
>    - {% icon param-file %} “Select the input CellProfiler pipeline”: `output_pipeline` (output of GrayToColor {% icon tool %})
>    - “Select the type of image to save”: Image
>      - “Saved the format to save the image(s)”: tiff
>      - “Enter the name of the image to save”: CombinedMask
>    - “Select method for constructing file names”: From image filename
>      - “Enter the image name (from NamesAndTypes) to be used as file prefix”: DNA
>      - “Append a suffix to the image file name?”: Yes
>        - “Text to append to the image name”: `_combinedMask`
>    - “Overwrite existing files without warning?”: Yes
>
>    > ### {% icon comment %} Comment
>    >
>    > - You can pick any other colors of your choice, as long as the contrast is good enough to distinguish both objects.
>    > - We save a TIFF image here, but any other format offered by the tool would work too.
>    {: .comment}
{: .hands_on}
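GrayToColor with the RGB scheme places each grayscale image in one color channel, scaled by its relative weight. The per-pixel sketch below combines the two masks, assuming 0/1 binary masks, weights applied as simple multipliers and 8-bit output. These details are our assumptions, not a description of the tool's internals. It shows why a pixel that belongs to both masks ends up with red and blue together, i.e. magenta.

```python
def combine_masks(red_mask, blue_mask, red_weight=0.8, blue_weight=0.5):
    """Return an RGB image (rows of (r, g, b) tuples in 0-255) from two 0/1 masks."""
    rgb = []
    for red_row, blue_row in zip(red_mask, blue_mask):
        rgb.append([
            (round(255 * red_weight * r), 0, round(255 * blue_weight * b))
            for r, b in zip(red_row, blue_row)
        ])
    return rgb


nucleoli = [[0, 1, 0]]  # red channel (MaskNucleoli)
nuclei = [[1, 1, 0]]    # blue channel (MaskNuclei)
print(combine_masks(nucleoli, nuclei))
```

The first pixel is nucleus only (blue), the second is a nucleolus inside a nucleus (red plus blue) and the third is background (black).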
Extracting the background is useful for quality control. For instance, in an overexposed or low-contrast image, the nuclei will not differ much from the background, which may lead to a poor segmentation.

To extract the background, we first need to get the foreground and subtract it from the original image. We already have a nuclei mask; however, it excludes incomplete nuclei, i.e., those touching the borders or with sizes outside the specified range. The mask therefore does not cover all the nuclei, and we need to remove these constraints to get the complete foreground. In other words, we want to detect everything above a certain intensity (the foreground) and subtract it from the complete image to get the background.
> ### {% icon hands_on %} Hands-on: Segment all nuclei
>
> 1. IdentifyPrimaryObjects {% icon tool %} with the following parameters:
>    - {% icon param-file %} “Select the input CellProfiler pipeline”: `output_pipeline` (output of SaveImages {% icon tool %})
>    - “Use advanced settings?”: Yes, use advanced settings
>      - “Enter the name of the input image (from NamesAndTypes)”: DNA
>      - “Enter the name of the primary objects to be identified”: NucleiIncludingTouchingBorders
>      - “Typical minimum diameter of objects, in pixel units (Min)”: 15
>      - “Typical maximum diameter of objects, in pixel units (Max)”: 200
>      - “Discard objects outside the diameter range?”: No
>      - “Discard objects touching the border of the image?”: No
>      - “Threshold strategy”: Global
>        - “Thresholding method”: Otsu
>          - “Two-class or three-class thresholding?”: Two classes
>        - “Threshold correction factor”: 0.9
>      - “Method to distinguish clumped objects”: Shape
>        - “Method to draw dividing lines between clumped objects”: Shape
>        - “Automatically calculate size of smoothing filter for declumping?”: Yes
>        - “Automatically calculate minimum allowed distance between local maxima?”: Yes
>      - “Handling of objects if excessive number of objects identified”: Continue
> 2. ConvertObjectsToImage {% icon tool %} with the following parameters:
>    - {% icon param-file %} “Select the input CellProfiler pipeline”: `output_pipeline` (output of IdentifyPrimaryObjects {% icon tool %})
>    - “Enter the name of the input objects you want to convert to an image”: NucleiIncludingTouchingBorders
>    - “Enter the name of the resulting image”: Image_NucleiIncludingTouchingBorders
>    - “Select the color format”: Binary (black & white)
{: .hands_on}
> ### {% icon hands_on %} Hands-on: Subtract nuclei from the original image
>
> 1. ImageMath {% icon tool %} with the following parameters:
>    - {% icon param-file %} “Select the input CellProfiler pipeline”: `output_pipeline` (output of ConvertObjectsToImage {% icon tool %})
>    - “Enter a name for the resulting image”: BG
>    - “Operation”: Subtract
>      - In “First Image”:
>        - “Image or measurement?”: Image
>          - “Enter the name of the first image”: DNA
>      - In “Second Image”:
>        - “Image or measurement?”: Image
>          - “Enter the name of the second image”: Image_NucleiIncludingTouchingBorders
>    - “Ignore the image masks?”: No
>
>    > ### {% icon comment %} Comment
>    >
>    > When the operation is Subtract, the order of the first and second images is important.
>    {: .comment}
{: .hands_on}
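The per-pixel sketch below shows why this subtraction leaves only the background, and why the order matters. It assumes the DNA image is scaled to the range 0 to 1, the foreground mask uses 0/1 values, and negative results are clipped to 0; these are our assumptions about the tool's behaviour. Under them, every foreground pixel drops to 0 or below and is clipped, while background pixels keep their original intensity. The pixel values are hypothetical.

```python
def subtract_images(first, second, clip_negative=True):
    """Pixel-wise first - second, optionally clipping negative results to 0."""
    result = []
    for row_a, row_b in zip(first, second):
        row = [a - b for a, b in zip(row_a, row_b)]
        if clip_negative:
            row = [max(0.0, v) for v in row]
        result.append(row)
    return result


dna = [[0.05, 0.60], [0.70, 0.08]]  # hypothetical DNA intensities in [0, 1]
foreground = [[0, 1], [1, 0]]       # binary mask of all nuclei
background = subtract_images(dna, foreground)      # DNA - mask
swapped = subtract_images(foreground, dna)         # mask - DNA: not the background
```

Swapping the two images instead gives a value close to 1 minus the DNA intensity inside the nuclei and 0 elsewhere, which is not the background.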
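Now that we have segmented the objects of interest and extracted the background, we can start measuring features on them. In particular, we will measure:

- the granularity of the DNA image,
- the texture and intensity of the DNA staining within the nuclei,
- the size and shape of the nuclei and of the nucleoli.

A step that requires special attention is relating each nucleolus to its parent nucleus, which is needed to compute statistics on the number of nucleoli per nucleus.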
> ### {% icon comment %} Comment
>
> The order in which the tools are chained in this section does not affect the outcome.
{: .comment}
> ### {% icon hands_on %} Hands-on: Measure the granularity, texture, intensity, size and shape
>
> 1. MeasureGranularity {% icon tool %} with the following parameters:
>    - {% icon param-file %} “Select the input CellProfiler pipeline”: `output_pipeline` (output of ImageMath {% icon tool %})
>    - In “new image”:
>      - {% icon param-repeat %} “Insert new image”
>        - “Enter the name of a greyscale image to measure”: DNA
> 2. MeasureTexture {% icon tool %} with the following parameters:
>    - {% icon param-file %} “Select the input CellProfiler pipeline”: `output_pipeline` (output of MeasureGranularity {% icon tool %})
>    - In “new image”:
>      - {% icon param-repeat %} “Insert new image”
>        - “Enter the name of an image to measure”: DNA
>    - “Measure images or objects?”: Objects
>      - In “new object”:
>        - {% icon param-repeat %} “Insert new object”
>          - “Enter the names of the objects to measure”: Nuclei
> 3. MeasureObjectIntensity {% icon tool %} with the following parameters:
>    - {% icon param-file %} “Select the input CellProfiler pipeline”: `output_pipeline` (output of MeasureTexture {% icon tool %})
>    - In “new image”:
>      - {% icon param-repeat %} “Insert new image”
>        - “Enter the name of an image to measure”: DNA
>    - In “new object”:
>      - {% icon param-repeat %} “Insert new object”
>        - “Enter the name of the objects to measure”: Nuclei
> 4. MeasureObjectSizeShape {% icon tool %} with the following parameters:
>    - {% icon param-file %} “Select the input CellProfiler pipeline”: `output_pipeline` (output of MeasureObjectIntensity {% icon tool %})
>    - In “new object”:
>      - {% icon param-repeat %} “Insert new object”
>        - “Enter the name of the object to measure”: Nuclei
>      - {% icon param-repeat %} “Insert new object”
>        - “Enter the name of the object to measure”: Nucleoli
{: .hands_on}
> ### {% icon question %} Questions
>
> Why are we measuring the granularity, texture and intensity of the original image and of the nuclei only?
>
> > ### {% icon solution %} Solution
> >
> > The nucleoli are not stained in the DNA channel, so their granularity, texture and intensity are essentially constant and carry little information.
> {: .solution}
{: .question}
It can be useful to compute statistics on the number of nucleoli inside each nucleus. CellProfiler provides a module, RelateObjects, that assigns each nucleolus to its parent nucleus by linking the identifier of each nucleolus to the identifier of the nucleus that contains it.
> ### {% icon hands_on %} Hands-on: Relate nucleoli to their parent nucleus
>
> 1. RelateObjects {% icon tool %} with the following parameters:
>    - {% icon param-file %} “Select the input CellProfiler pipeline”: `output_pipeline` (output of MeasureObjectSizeShape {% icon tool %})
>    - “Parent objects”: Nuclei
>    - “Child objects”: Nucleoli
>    - “Calculate child-parent distances?”: Both
>      - “Calculate distances to other parents?”: No
>    - “Do you want to save the children with parents as a new object set?”: Yes
{: .hands_on}
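After RelateObjects, each nucleolus carries the number of its parent nucleus, so counting nucleoli per nucleus amounts to grouping on that parent number. The sketch below assumes per-nucleolus records with an image number and a parent number, where 0 means the nucleolus has no parent. We believe CellProfiler stores the parent as a `Parent_Nuclei` measurement, but this is an assumption, so check the column names in your exported tables. The records below are hypothetical.

```python
from collections import Counter


def nucleoli_per_nucleus(nucleoli):
    """Count nucleoli per (image number, parent nucleus number)."""
    counts = Counter()
    for record in nucleoli:
        parent = record["Parent_Nuclei"]  # assumed column name
        if parent == 0:
            continue  # nucleolus not assigned to any nucleus
        counts[(record["ImageNumber"], parent)] += 1
    return counts


# Hypothetical per-nucleolus records, for illustration only.
nucleoli = [
    {"ImageNumber": 1, "ObjectNumber": 1, "Parent_Nuclei": 1},
    {"ImageNumber": 1, "ObjectNumber": 2, "Parent_Nuclei": 1},
    {"ImageNumber": 1, "ObjectNumber": 3, "Parent_Nuclei": 2},
    {"ImageNumber": 1, "ObjectNumber": 4, "Parent_Nuclei": 0},
]
counts = nucleoli_per_nucleus(nucleoli)
```

Nuclei without any nucleolus do not appear in the result, so they need to be added with a count of 0 if you want them in your statistics.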
All the features we have measured on the images and objects now need to be exported to files covering each of the 6 example images analysed.
> ### {% icon hands_on %} Hands-on: Export features
>
> 1. ExportToSpreadsheet {% icon tool %} with the following parameters:
>    - {% icon param-file %} “Select the input CellProfiler pipeline”: `output_pipeline` (output of RelateObjects {% icon tool %})
>    - “Select the column delimiter”: Tab
>    - “Add a prefix to file names?”: Do not add prefix to the file name
>    - “Create a GenePattern GCT file?”: No
>    - “Export all measurement types?”: Yes
{: .hands_on}
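Once the pipeline has run, the tab-delimited tables can be loaded with any spreadsheet program or scripting language. The sketch below uses Python's `csv` module to average a per-object measurement for each image. The inline table and the column names `ImageNumber` and `AreaShape_Area` are assumptions for illustration, so check the headers of your own exported files.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt of a tab-delimited per-nucleus table.
exported = (
    "ImageNumber\tObjectNumber\tAreaShape_Area\n"
    "1\t1\t850\n"
    "1\t2\t910\n"
    "2\t1\t780\n"
)


def mean_per_image(handle, column):
    """Average a numeric column per ImageNumber from a tab-delimited table."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(handle, delimiter="\t"):
        image = int(row["ImageNumber"])
        totals[image] += float(row[column])
        counts[image] += 1
    return {image: totals[image] / counts[image] for image in totals}


means = mean_per_image(io.StringIO(exported), "AreaShape_Area")
```

For a real file, pass `open(path, newline="")` instead of the `io.StringIO` object.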
All the steps in our workflow (except for IDR Download {% icon tool %}) have taken the `output_pipeline` of the previous step as input. This is how the CellProfiler modules are assembled into a single pipeline, and now we can run all of them together!
> ### {% icon hands_on %} Hands-on: Run CellProfiler pipeline
>
> 1. CellProfiler {% icon tool %} with the following parameters:
>    - {% icon param-file %} “Pipeline file”: `output_pipeline` (output of ExportToSpreadsheet {% icon tool %})
>    - “Are the input images packed into a tar archive?”: Yes
>      - {% icon param-file %} “A tarball of images”: `output_tar` (output of IDR Download {% icon tool %})
>    - “Detailed logging file?”: Yes
>
>    > ### {% icon comment %} Comment
>    >
>    > This is the only time-consuming step of the workflow, as it performs the whole analysis on the input dataset.
>    {: .comment}
{: .hands_on}
# Conclusion
{:.no_toc}
In this tutorial, you have downloaded images from a public image repository into your Galaxy history. You then built and ran a typical image analysis pipeline, made up of the segmentation of several types of objects followed by feature extraction. As a result, you obtained plenty of features to analyse, as well as masks to check that the segmentation algorithms worked as expected. Now you are ready to perform your biological data analysis!